Improving the Cross-Lingual Projection of Syntactic Dependencies
Abstract
This paper presents several modifications of the standard annotation projection algorithm for syntactic structures in cross-lingual dependency parsing. Our approach reduces projection noise and includes efficient data subset selection techniques that have a substantial impact on parser performance in terms of labeled attachment scores. We test our techniques on data from the Universal Dependency Treebank and demonstrate the improvements on a number of language pairs. We also look at treebank translation, including syntax-based models, and data combination techniques that push the performance even further. We achieve absolute improvements of more than seven points in labeled attachment score in the best cases, advancing the state of the art in cross-lingual dependency parsing for all language pairs tested in our experiments.
Similar resources
Soft Cross-lingual Syntax Projection for Dependency Parsing
This paper proposes a simple yet effective framework of soft cross-lingual syntax projection to transfer syntactic structures from source language to target language using monolingual treebanks and large-scale bilingual parallel text. Here, soft means that we only project reliable dependencies to compose high-quality target structures. The projected instances are then used as additional trainin...
Cross-Lingual Syntactic Transfer with Limited Resources
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters, that can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source...
Cross-Lingual Syntactically Informed Distributed Word Representations
We develop a novel cross-lingual word representation model which injects syntactic information through dependency-based contexts into a shared cross-lingual word vector space. The model, termed CLDEPEMB, is based on the following assumptions: (1) dependency relations are largely language-independent, at least for related languages and prominent dependency links such as direct objects, as evidenc...
Relaxed Cross-lingual Projection of Constituent Syntax
We propose a relaxed correspondence assumption for cross-lingual projection of constituent syntax, which allows a supposed constituent of the target sentence to correspond to an unrestricted treelet in the source parse. Such a relaxed assumption fundamentally tolerates the syntactic non-isomorphism between languages, and enables us to learn the target-language-specific syntactic idiosyncrasy ra...
Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels
This paper presents cross-lingual models for dependency parsing using the first release of the universal dependencies data set. We systematically compare annotation projection with monolingual baseline models and study the effect of predicted PoS labels in evaluation. Our results reveal the strong impact of tagging accuracy especially with models trained on noisy projected data sets. This paper...